Table 5: Hyperparameters for the MNIST-addition experiments.

| Hyperparameter | Value |
| --- | --- |
| optimizer | AdamW |
| learning rate | 0.0003 |
| learning rate schedule | cosine |
| training epochs | 100 |
| weight decay | 0.00001 |
| batch size | 4 |
| embedding dimensions | 10 |
| embedding initialization | one-hot, fixed |
| neural networks | LeNet5 |
| max search depth | / |
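For concreteness, here is a minimal sketch of how these hyperparameters might map onto a PyTorch training setup; the `model` placeholder stands in for the LeNet5 digit network and the 10-dimensional one-hot-initialized embeddings, which are not shown.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

# Placeholder module: in the actual experiments this would be the LeNet5
# classifier plus the fixed one-hot embeddings listed in Table 5.
model = torch.nn.Linear(10, 10)

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=1e-5)
scheduler = CosineAnnealingLR(optimizer, T_max=100)  # cosine decay over 100 epochs

for epoch in range(100):
    # ... iterate over MNIST-addition batches (batch size 4), calling
    # optimizer.step() after each backward pass ...
    scheduler.step()
```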
D.2 Countries

Hyperparameters are summarized in Table 6. We ran all experiments on a single CPU (Apple M2).
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Asia > Middle East > Israel (0.05)
- Europe > Austria > Vienna (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- (16 more...)
Masked Image Modeling Supplementary Material

1 More Training Details
We use the same settings for MIM pre-training across the different RevCol model sizes. The hyper-parameters generally follow [4, 2]. Table 3 shows the detailed training settings used after MIM pre-training. We also show the training settings for ImageNet-1K fine-tuning after ImageNet-22K fine-tuning. For semantic segmentation, we evaluate the different backbones on the ADE20K dataset.
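Since Table 3 itself is not reproduced here, the sketch below illustrates the masked-image-modeling objective in schematic form. The tiny encoder, decoder, mask token, and mask ratio are illustrative placeholders in the spirit of SimMIM-style pre-training, not RevCol's modules or actual recipe.

```python
import torch
import torch.nn as nn

# Schematic MIM step: mask a fraction of patch tokens, reconstruct the
# original patches, and compute an L1 loss on the masked positions only.
patch_dim, n_patches, mask_ratio = 48, 196, 0.6
layer = nn.TransformerEncoderLayer(d_model=patch_dim, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
decoder = nn.Linear(patch_dim, patch_dim)        # predicts raw patch pixels
mask_token = nn.Parameter(torch.zeros(patch_dim))

patches = torch.randn(4, n_patches, patch_dim)   # (B, N, patch_dim) pixel patches
mask = torch.rand(4, n_patches) < mask_ratio     # True = patch is masked out
x = torch.where(mask[..., None], mask_token, patches)
pred = decoder(encoder(x))
loss = (pred - patches)[mask].abs().mean()       # loss only on masked patches
loss.backward()
```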
Rotary Masked Autoencoders are Versatile Learners
Uros Zivanovic, Serafina Di Gioia, Andre Scaffidi, Martín de los Rios, Gabriella Contardo, Roberto Trotta
Applying Transformers to irregular time-series typically requires specializing the baseline architecture, which can result in additional computational overhead and increased method complexity. We present the Rotary Masked Autoencoder (RoMAE), which utilizes the popular Rotary Positional Embedding (RoPE) method for continuous positions. RoMAE is an extension of the Masked Autoencoder (MAE) that enables interpolation and representation learning with multidimensional continuous positional information while avoiding any time-series-specific architectural specializations. We showcase RoMAE's performance on a variety of modalities, including irregular and multivariate time-series, images, and audio, demonstrating that RoMAE surpasses specialized time-series architectures on difficult datasets such as the DESC ELAsTiCC Challenge while maintaining MAE's usual performance across other modalities. In addition, we investigate RoMAE's ability to reconstruct the embedded continuous positions, demonstrating that including learned embeddings in the input sequence breaks RoPE's relative position property.
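To make the relative-position claim concrete, here is a minimal sketch of RoPE applied to continuous positions, assuming the standard base of 10000; `rope_rotate` is an illustrative helper, not RoMAE's actual implementation. Rotating each two-channel pair of a query or key by an angle proportional to its position makes the query-key dot product depend only on the position difference.

```python
import torch

def rope_rotate(x: torch.Tensor, pos: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs of x by angles proportional to (continuous) position.

    x:   (..., dim) query or key vectors, dim must be even
    pos: (...) positions; floats are fine, e.g. irregular timestamps
    """
    dim = x.shape[-1]
    # One frequency per 2-D channel pair: theta_i = base^(-2i/dim).
    theta = base ** (-torch.arange(0, dim, 2, dtype=x.dtype) / dim)
    angles = pos[..., None] * theta          # (..., dim/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin     # standard 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# Relative-position property: the dot product of a rotated query and key
# depends only on the *difference* of their positions.
q, k = torch.randn(8), torch.randn(8)
s1 = rope_rotate(q, torch.tensor(2.3)) @ rope_rotate(k, torch.tensor(0.8))
s2 = rope_rotate(q, torch.tensor(102.3)) @ rope_rotate(k, torch.tensor(100.8))
assert torch.allclose(s1, s2, atol=1e-4)     # same offset (1.5) -> same score
```

Because `pos` enters only through the rotation angles, irregular float timestamps work without any time-series-specific change to the architecture.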